ctrfandomcom-20200214-history
Break rotatable bonds and report the fragments
This task is meant to give examples of how to add and delete atoms and bonds from the molecular graph, using the graph API. One of the researchers I worked with asked me for a program which took an input molecule and broke it apart at the rotatable bonds. When a bond breaks, each fragment gets attached to an atom with atomic number 0, that is "*". For a simple example, "c1ccccc1C=O" would become "c1ccccc1*" and "*C=O". "Rotatable bond" was defined as the bond in the following SMARTS [!$(NH!@C(=O))&!D1&!$(*#*)]-&!@[!$(NH!@C(=O))&!D1&!$(*#*)] but a toolkit may have a built-in definition of rotatable bond. Implementation Write a function or method which takes a molecule as input and finds all of the rotatable bonds using either the toolkit's built-in definition or the above SMARTS definition. Delete each bond from the structure and add an atom with atomic number 0 to each of the atoms which was at the end of the bond. The function may modify the molecule in-place or return a new molecule. Use the function to write a program which takes the structure "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" (which is the perceived aromatic form of PubChem CID 391758), breaks the rotatable bonds, and prints the fragments out in SMILES form, one per line. The output should be something like *c1(ccc(cc1)O)* *c1(cccc(c1)O)* *CH1(C(=O)N(c2ccccc2C(=N1)(*)*)(*)*)* *CH2(*)(*)* *CH2(*)(*)* *C(=O)(*)O although the specific strings will depend on the toolkit's canonicalization algorithm. If the toolkit can do this with SMIRKS then feel free to show how that works, in addition to a solution which manipulates the graph directly. (Frankly, I haven't figured out how to do it with SMIRKS in OEChem.) OpenBabel/Rubabel require 'rubabel' smarts = "[!$(NH!@C(=O))&!D1&!$(*#*)]-&!@[!$(NH!@C(=O))&!D1&!$(*#*)]" mol = Rubabel"c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" mol.matches(smarts).each do |atom1, atom2| mol.delete(atom1.get_bond(atom2)) atom2.each do |old_a| mol.add_bond!(old_a, mol.add_atom!(0)) end end OpenEye/Python from openeye.oechem import * rotatable_pat = OESubSearch("[!$(NH!@C(=O))&!D1&!$(*#*)]-&!@[!$(NH!@C(=O))&!D1&!$(*#*)]") def break_rotatable_bonds(mol): # Get the rotatable bonds rotatable_bonds = [] for match in rotatable_pat.Match(mol): rotatable_bonds.extend(match.GetTargetBonds()) # Delete each bond for bond in rotatable_bonds: # Get the atoms it's bonded too then delete the atom bond_atoms = bond.GetBgn(), bond.GetEnd() mol.DeleteBond(bond) # Add the dummy atom to each of the ex-bond's atoms for atom in bond_atoms: new_atom = mol.NewAtom(0) mol.NewBond(atom, new_atom, 1) mol = OEGraphMol() OEParseSmiles(mol, "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O") OEAssignAromaticFlags(mol) break_rotatable_bonds(mol) print OECreateCanSmiString(mol).replace(".", "\n") SMIRKS version from openeye.oechem import * def main(): ' ' SMIRKS = "[!$(NH!@C(=O))&!D1&!$(*#*):1]-&!@[!$(NH!@C(=O))&!D1&!$(*#*):2]>>*:1*.**:2" molecule = OEGraphMol() OEParseSmiles(molecule, 'c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O') libgen = OELibraryGen(SMIRKS) libgen.SetValenceCorrection(True) libgen.SetExplicitHydrogens(False) libgen.SetStartingMaterial(molecule, 0) for product in libgen.GetProducts(): partcount,partlist = OEDetermineComponents(product) pred = OEPartPredAtom(partlist) for i in range(partcount): fragment = OEGraphMol() pred.SelectPart(i+1) OESubsetMol(fragment, product, pred) print OECreateCanSmiString(fragment), print main() "*C1=NC(C(=O)N(c2c1cccc2)CC(=O)O)Cc3ccc(cc3)O" "*c1cccc(c1)O" "*C1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*Cc1ccc(cc1)O" "*N1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*CC(=O)O" "*CN1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*C(=O)O" "*CC1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*c1ccc(cc1)O" RDKit/Python from rdkit import Chem patt = Chem.MolFromSmarts('[!$(NH!@C(=O))&!D1&!$(*#*)]-&!@[!$(NH!@C(=O))&!D1&!$(*#*)]') mol = Chem.MolFromSmiles("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O") # find the rotatable bonds bonds = mol.GetSubstructMatches(patt) # create an editable molecule, break the bonds, and add dummies: em = Chem.EditableMol(mol) nAts = mol.GetNumAtoms() for a,b in bonds: em.RemoveBond(a,b) em.AddAtom(Chem.Atom(0)) em.AddBond(a,nAts,Chem.BondType.SINGLE) em.AddAtom(Chem.Atom(0)) em.AddBond(b,nAts+1,Chem.BondType.SINGLE) nAts+=2 p = em.GetMol() Chem.SanitizeMol(p) smis = for x in Chem.GetMolFrags(p,asMols=True) for smi in smis: print smi # ---------------------------------- # There's actually an easier way to do this since the BRICS decomposition code has # exactly the function we need: from rdkit.Chem import BRICS bonds = for x,y in bonds p = BRICS.BreakBRICSBonds(mol,bonds=bonds) smis = for x in Chem.GetMolFrags(p,asMols=True) for smi in smis: print smi Cactvs/Tcl set eh create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" prop setparam E_SMILES useexplicith 1 filter create rotbonds property B_ROTATABILITY operator = value1 yes foreach b bonds $eh rotbonds { lassign atoms $eh $b a1 a2 bond delete $eh $b bond create $eh $a1 [atom add $eh *] bond create $eh $a2 [atom add $eh *] } foreach fragsmiles get $eh M_SMILES { puts $fragsmiles } Output: C1(=C(C2=C(C(=C1H)H)N(C(C(N=C2*)(H)*)=O)*)H)H C(H)(H)(*)* C(=O)(OH)* C(H)(H)(*)* C3(=C(C(=C(C(=C3H)H)OH)H)H)* C4(=C(C(=C(C(=C4H)H)H)OH)H)* And here is the requested version using SMIRKS with the provided rotatable bond definition pattern: set eh create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" prop setparam E_SMILES useexplicith 1 set transform {[!$(NH!@C(=O))&!D1&!$(*#*):1]-&!@[!$(NH!@C(=O))&!D1&!$(*#*):2]>>*:1*.*:2*} set eh transform $eh [list $transform forward exhaustive] foreach fragsmiles get $eh M_SMILES { puts $fragsmiles } The output is the same. Using SMIRKS is slower than the first version, though, and it requires toolkit version 3.384 or later. Previous versions ignored non-standard atoms for degree computations, without the option to include query atoms, and since the D attribute is used in the SMIRKS transform, the outcome was not what is expected. Cactvs/Python The first solution scripted in Python looks like this: e=Ens("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O") Prop.Setparam('E_SMILES','useexplicith',1) f=Filter('rotbonds','property','B_ROTATABILITY','operator','=','value1','yes') for b in e.bonds(f): (a1,a2)=b.atoms() b.delete() Bond(e,(a1,Atom(e,'*'))) Bond(e,(a2,Atom(e,'*'))) for fragsmiles in e.M_SMILES: print(fragsmiles) The output is identical. Category:Cactvs/Tcl Category:editing Category:SMILES Category:SMARTS Category:OpenEye/Python Category:RDKit/Python